Development and assessment of an RNA editing-based risk model for the prognosis of cervical cancer patients

RNA editing, as an epigenetic mechanism, exhibits a strong correlation with the occurrence and development of cancers. Nevertheless, few studies have been conducted to investigate the impact of RNA editing on cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC). In order to study the connection between RNA editing and CESC patients’ prognoses, we obtained CESC-related information from The Cancer Genome Atlas (TCGA) database and randomly allocated the patients into the training group or testing group. An RNA editing-based risk model for CESC patients was established by Cox regression analysis and least absolute shrinkage and selection operator (LASSO). According to the median score generated by this RNA editing-based risk model, patients were categorized into subgroups with high and low risks. We further constructed the nomogram by risk scores and clinical characteristics and analyzed the impact of RNA editing levels on host gene expression levels and adenosine deaminase acting on RNA. Finally, we also compared the biological functions and pathways of differentially expressed genes (DEGs) between different subgroups by enrichment analysis. In this risk model, we screened out 6 RNA editing sites with significant prognostic value. The constructed nomogram performed well in forecasting patients’ prognoses. Furthermore, the level of RNA editing at the prognostic site exhibited a strong correlation with host gene expression. In the high-risk subgroup, we observed multiple biological functions and pathways associated with immune response, cell proliferation, and tumor progression. This study establishes an RNA editing-based risk model that helps forecast patients’ prognoses and offers a new understanding of the underlying mechanism of RNA editing in CESC.


Introduction
Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) are among the most prevalent types of cancer affecting women globally, ranking fourth in both incidence and mortality. [1]Squamous cell carcinoma and adenocarcinoma are the predominant histological subtypes in CESC, comprising around 70% and 25% of CESC cases, respectively. [2,3]CESC develops primarily due to continued human papillomavirus (HPV) infection, [4,5] and thus is often prevented through HPV screening and vaccination. [3,6,7]0] This has some detrimental implications for strategies based on HPV screening and vaccination to prevent CESC.In clinical practice, treatment methods for CESC patients mainly involve hysterectomy, chemotherapy, radiotherapy or adjuvant therapy. [3]Although patients with early-stage CESC have relatively good outcomes after treatment, those with advanced, metastatic, or recurrent CESC have poor therapeutic effects. [11,12][15] Similarly, we can find potential and important biomarkers as prognostic indicators to better prevent the development, metastasis and recurrence of CESC.
RNA editing, which occurs as a post-or co-transcriptional modification, alters specific RNA sequences and enhances the multiformity of transcriptome and proteome. [16]Besides, RNA editing is mainly divided into 2 primary forms, namely adenosine to inosine (A-to-I) and cytidine to uracil (C-to-U).Among these, A-to-I RNA editing is more prevalent and catalyzed by enzymes known as adenosine deaminases acting on RNA (ADARs). [17,18]RNA editing can affect the stability, structure, and function of RNA, thereby impacting protein synthesis and regulating protein function. [19,20]Therefore, due to its capacity to instruct precise modifications in protein function and its underlying mechanisms, RNA editing is considered a well-positioned tool for adapting to the environment. [20]s sequencing technology continues to evolve, the significance of RNA editing in cancers has been realized by an increasing number of studies.The study has discovered that some RNA editing sites are associated with tumors, and RNA editing events affect drug sensitivity. [21]Moreover, RNA editing is able to influence the proliferation, metastasis, and immune response in tumor cell, thereby further impacting both malignant tumor progression and treatment response. [22,23]Compared to the matched normal tissue, there is an elevation in RNA editing activity across a majority of tumor types. [24]In order to assist tumor cells in adapting to different microenvironments and disease states, RNA editing levels may dynamically alter due to microenvironmental factors or tumor progression. [19,21,25,26][29] However, there are no studies on the construction of prognostic models for CESC by utilizing RNA editing as a factor with predictive value, and the role of RNA editing in the prognosis of CESC has not been fully explored.
Based on our study, prognosis-related RNA editing sites in CESC were first determined to establish and evaluate corresponding prognostic models.The nomogram used to forecast patients' prognoses was further established by RNA editing-based risk score and clinical characteristics, and its prognostic ability was verified.Finally, the underlying mechanisms of RNA editing sites in CESC were also explored.

Data acquisition
We obtained clinical information and RNA-seq data for CESC patients from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/).The corresponding RNA editing profiles of CESC patients were acquired from the synapse website (https://www.synapse.org/#!Synapse:syn4382383). [21] In addition, we removed cases with incomplete clinical information and RNA editing sites with missing editing levels in over 30% of samples.After processing the dataset, the TCGA dataset was divided into a training group and a testing group according to a 1:1 ratio.The TCGA training group was adapted to establish a risk model based on RNA editing, while the TCGA testing group was used to verify the prognostic capability of model.

Establishment of RNA editing-based risk model
To screen for sites with potential prognostic value, univariate Cox regression analysis was performed on overall survival (OS) in the training group.The detailed chromosomal locations of these sites were presented in the Manhattan plot through the "CMplot" package.Afterwards, LASSO regression analysis was performed on the screened RNA editing sites to avoid model overfitting by the "glmnet" package. [30]urthermore, we used multivariate Cox regression analysis to select the sites with prominent prognostic value by stepwise regression method. [31]According to the final multivariate Cox model, we calculated risk scores by using the following method: where Exp i represented the editing level of sites, and Coe i represented the multivariate Cox regression coefficient of corresponding sites.

Validation of RNA editing-based risk model
Patients were classified to either the subgroup with high-risk or the subgroup with low-risk by comparing the individual risk scores with the calculated median risk score.Afterwards, scatter plots were applied to study the connection between survival status and RNA editing-based risk scores, and heat maps were utilized to show the editing level of prognostic sites among different risk subgroups.Kaplan-Meier analysis was employed to compare OS and progression-free survival (PFS) between different subgroups by "survminer" and "survival" packages.

Relationship between risk scores and various clinicopathological characteristics
In order to investigate the association between risk scores and various clinical characteristics, Student t test was employed to compare the differences in risk scores among different ages, tumor stages, and tumor grades.Also, we conducted univariate and multivariate Cox regression analyses on both RNA editing-based scores and clinical features to evaluate whether this score could be served as an independent predictive indicator.

Establishment and assessment of prognostic nomogram
We combined RNA editing-based risk scores and basic clinical features to build the nomogram by means of "rms" package, [32] which can be used as the quantitative tool for forecasting patients' prognoses in real-world clinical settings.Afterwards, the predictive performance of nomogram was assessed through various methods including calibration curves, concordance index (C-index), and receiver operating characteristic (ROC) curves.Finally, to determine whether the nomogram can be of practical value in clinical decision-making, we conducted a decision curve analysis. [33]

Correlation between RNA editing site and host gene expression
By using RNA editing data and RNA-seq data, we investigated the correlation between the level of RNA editing at the prognostic site and host gene expression.Since the ADAR family of enzymes was the key enzyme used in RNA editing, [34] the association of risk scores with ADAR expression level was also investigated.

Enrichment analysis of differentially expressed genes (DEGs)
We employed a threshold of |log2 FC| > 1 and P < .05 to identify DEGs between different subgroups by using "limma" package.
Afterwards, Gene ontology (GO) enrichment analysis was applied to explore the biological properties of DEGs, including the biological process (BP), cellular component (CC) and molecular function (MF).Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was also performed to find pathways associated with DEGs. [35]Based on the "c2.cp.kegg.symbols.gmt"and "c5.go.symbols.gmt"gene sets in the Molecular Signatures Database, we performed a gene set enrichment analysis (GSEA) on the whole genome to study the potential mechanism by which RNA editing site may exert their effects in cervical carcinoma. [36,37]

Statistical analysis
We utilized the chi-squared test to assess whether there were differences in the proportions of patients with distinct clinical features between the training and testing groups.The relationship between the 2 factors was explored by Spearman correlation analysis.In this research, we carried out statistical analyses by making use of R software (version 4.0.5), and statistical significance was defined as P < .05.

Establishment and validation of RNA editing-based risk model
The flow chart of our research is displayed below (Fig. 1).To construct and verify the prognostic model, we obtained 196 samples with RNA editing data and valid clinical information.These samples were then randomly split into the training and testing groups in equal proportion.Table 1 shows the baseline features of 2 groups.From the table, we observed no statistically significant differences in all clinical features between the training and testing groups.
In order to establish an RNA editing-based risk model for forecasting CESC patients' prognoses, we conducted univariate analysis on all RNA editing sites to initially screen the sites related to prognosis in the training group.By univariate analysis, we confirmed 16 RNA editing sites that demonstrated a strong association with patients' OS (P < .001).Then, the Manhattan plot demonstrated these prognosis-related sites and their chromosomal locations (Fig. 2A).To avoid model overfitting, the above 16 RNA editing sites were moved into LASSO regression analysis, resulting in the identification of 11 sites with prognostic value based on the least partial likelihood deviation (Fig. 2B).Subsequently, 11 sites were further included in multivariate analysis, and finally 6 sites with significantly prognostic value were obtained.These 6 sites and corresponding regression coefficients were employed to build the following prognostic risk model: To test the accuracy and reliability of the risk model, the differences in survival status and editing levels of 6 prognostic sites among different risk populations were analyzed in the training group, testing group, and entire dataset.The findings indicated that low-risk patients had a longer survival time and a higher proportion of survival than high-risk patients across 3 datasets (Fig. 3A-F).The heatmap revealed that the editing level of ASB16 − AS1|chr17:42259075 generally showed a decreasing trend with increasing risk scores, while the editing levels of the other 5 sites showed the opposite trend (Fig. 3G-I).Moreover, the outcome of the survival analysis revealed that low-risk patients had a better prognosis than high-risk patients across the training group, testing group, and entire dataset (Fig. 3J-L).In the entire dataset, low-risk patients possessed a higher probability of PFS than patients with high-risk (Fig. 3M).The results above revealed that this risk model possessed good prognostic ability.

Relationship between risk score and different clinical features
We analyzed the relationship between risk scores and different clinical features by using the entire dataset.The results displayed no significant differences in risk scores between different age groups, tumor stage groups, and tumor grade groups (P > .05,Fig. 4).We assessed the independent predictive ability of risk score and different clinical characteristics by making use of univariate and multivariate Cox analyses (Table 2).According to the outcomes derived from univariate analysis, we found that risk scores were significantly correlated with OS in the entire dataset (P < .001,HR = 1.009 (95%CI: 1.006 − 1.012)).The result of further multivariate analysis revealed that risk score was an independent predictor for CESC patients (P < .001,HR = 1.009 (95%CI: 1.006 − 1.013)), which possessed significant value in forecasting patients' prognoses.

Construction and validation of RNA editing-based nomogram
According to RNA editing-based risk scores and clinical characteristics, the nomogram was established to improve the value of the prognostic model in clinical practice (Fig. 5A).By utilizing the nomogram, we could make predictions for 1-, 2-and 3-year survival probabilities of CESC patients.Calibration plot revealed the strong concordance between predicted and observed values (Fig. 5B).Across the entire dataset, the C-index was 0.844 (95% CI: 0.811 − 0.876), which further validated the good prognostic ability of the nomogram.Besides, ROC curves showed that risk scores (area under the curve [AUC] = 0.809) and the nomogram (AUC = 0.765) had higher area under the curve (AUC) values (Fig. 5C).This result revealed that the prognostic accuracies of risk scores and the nomogram were better than other clinical features.Finally, results of decision curve analysis indicated that the nomogram possessed relatively high clinical value (Fig. 5D).

Identification of DEGs and functional enrichment analysis
According to pre-set thresholds, we first carried out differential gene expression analysis to explore the biological functions and pathways of DEGs between different subgroups.The result indicated that 64 up-regulated and 136 down-regulated DEGs were recognized in the high-risk subgroup (Fig. 7A).Moreover, the expression of the top 50 significantly up-and downregulated DEGs was shown by heatmap (Fig. 7B).BP analysis in GO enrichment analysis suggested that DEGs primarily participated in immune-related biological processes, including immune cell proliferation, regulation of immune cell proliferation, amide transport, and humoral immune response (Fig. 7C).The result of CC analysis revealed that DEGs were mainly enriched in cluster of actin − based cell projections, apical plasma membrane and apical part of cell (Fig. 7C).MF analysis presented that DEGs were enriched in various molecular activities and molecular receptor binding like receptor ligand activity, signaling receptor activator activity and cytokine receptor binding (Fig. 7C).
Afterwards, the result of KEGG analysis displayed that DEGs were related to cytokine − cytokine receptor interaction, MAPK signaling pathway, IL − 17 signaling pathway, NF − kappa B signaling pathway, and TNF signaling pathway (Fig. 7D).

Gene set enrichment analysis
To further compare the biological functions and enriched pathways of DEGs among different risk subgroups, we conducted GSEA on the whole genome.Through the analysis, we found that DEGs within the low-risk subgroup were significantly associated with biological functions such as cellular response to xenobiotic stimulus, xenobiotic metabolic process, brush border, brush border membrane and the cluster of actin-based cell projection (Fig. 8A).As shown in Figure 8B, biological functions such as response to fungus, tissue migration, cellsubstrate junction and collagen binding were up-regulated in the subgroup with high-risk.Besides, DEGs in the subgroup with low-risk were mainly enriched in metabolism-related pathways (Fig. 8C).In contrast, DEGs in the high-risk subgroup were mainly enriched in cytokine-cytokine receptor interaction, ECM receptor interaction, regulation of actin cytoskeleton, focal adhesion and NOD-like receptor signaling pathway (Fig. 8D).

Discussion
Cervical cancer is a prevalent malignancy affecting the female reproductive system, with relatively high morbidity and mortality worldwide.Despite the success of the HPV prophylactic vaccine in reducing morbidity, the exact mechanisms of viral infection remain unclear.Furthermore, complex genetic and epigenetic alterations in host cell genes play a critical role in the progression of cervical precancerous lesions to invasive cancer. [38,39]RNA editing, serving as an epigenetic mechanism, is closely connected to the occurrence and development of tumors. [40]Also, RNA editing is one of the most prevalent genetic regulation methods in normal physiological processes.By altering the RNA sequences, it makes them different from the corresponding template genomic DNA, which influences the diversity and complexity of biomolecules. [16,19,41]Because RNA editing affects RNA levels, RNA localization, alternative splicing, translation efficiency, and amino acid sequences of proteins, dysregulation of its editing process plays a key role in cancers. [40,42,43]Therefore, we considered constructing a risk model through RNA editing to assess patients' prognoses.
In the study, we finally identified 6 RNA editing sites as the best prognostic indicators of CESC through Cox and LASSO regression analyses.Then, the RNA editing-based risk model was established to stratify CESC patients with different prognoses.RNA editing can regulate physiological and pathological processes by influencing host gene expression.The relevant host genes in the risk model are all associated with cancers.APOL1 encodes a secreted high-density lipoprotein, which has been regarded as an aberrantly expressed gene in multiple cancers. [44,45]ome studies found that suppression of ICMT resulted in induction of autophagy, inhibition of cell growth and inhibition of proliferation in various cancer cell types. [46,47]Since PRR11 acts as an oncogenic factor in cellular proliferation, migration, invasion, cell-cycle progression, apoptosis and autophagy in some cancers, it is considered a promising prognostic biomarker. [48]It has been reported that ASB16-AS1 is an oncogenic lncRNA in CESC, which affects the proliferation, migration and invasion of CESC cells by targeting miR-1305. [49]Moreover, the study has shown that elevated levels of PPP1R13L can increase tumorigenesis and affect the migration of tumor cells. [50]It has been reported that overexpression of ATG14 significantly enhances cell proliferation rate and inhibits apoptosis in colorectal cancer. [51]This evidence supported a functional basis for the connection between these RNA editing sites and patients' prognoses.
It is still not clear how these sites included in the model affect the survival of CESC patients.As reported, RNA editing can regulate gene expression levels by changing RNA sequences, which in turn can affect the translation and function of proteins. [41]In the study, we observed significantly positive correlations between chr19:45896816 and PPP1R13L expression, chr22:36662801 and APOL1 expression, and chr14:55834429 and ATG14 expression.Since mRNA levels may not necessarily represent post-transcriptionally regulated protein levels for genes, the absence of significant correlation between the level of RNA editing at other 3 sites and host gene expression does not necessarily imply that these editing sites have no effect on host gene expression.Further analysis at the protein level is necessary to determine whether these sites may influence the prognosis of CESC patients by regulating host gene expression.Besides, ADAR, as the key regulator of RNA editing, is closely linked to RNA editing.Furthermore, dysregulations of ADAR expression and RNA editing are common in diverse human diseases. [52,53]Our research also discovered a significant positive association between risk scores and ADAR expression levels.Furthermore, we verified that this prognostic model can accurately forecast patients' prognoses by survival analysis and ROC curve.Additionally, the significant and independent prognostic capacity of this risk model was further demonstrated after accounting for age, tumor stage, and tumor grade.To expand the clinical application and usability of RNA editing-based risk signatures, we combined risk score and clinical features to establish the nomogram with a high AUC value.Moreover, the calibration plot further revealed the good prognostic capability of this nomogram with high Harrell C-index as 0.844.By decision curves, the nomogram presented the better net benefit of clinical decisions.Overall, the nomogram showed good performance, accuracy, and stability in predicting OS in CESC.
Besides, we conducted an enrichment analysis to compare the differences in biological functions and pathways of DEGs among distinct risk subgroups.According to the findings of GO enrichment analysis, it could be observed that DEGs in different risk subgroups had significant changes in biological functions such as immune response, signal transduction, cell motility, and cell-cell interactions.These functional changes are intricately related to the proliferation and metastasis of tumor cell.KEGG analysis displayed that DEGs were connected with cytokine − cytokine receptor interaction, MAPK signaling pathway, IL − 17 signaling pathway, NF − kappa B signaling pathway, and TNF signaling pathway.56][57][58] GSEA was employed to further explore the biological function and pathway of DEGs among different risk subgroups.The results displayed that DEGs in the subgroup with highrisk were mainly enriched in cytokine-cytokine receptor interaction, ECM receptor interaction, focal adhesion, NOD-like receptor signaling pathway and regulation of actin cytoskeleton.Cytokine-cytokine receptor interaction is the important regulatory mechanism in physiological processes such as immunity and inflammation, and is strongly linked to the development and progression of cancer. [54,59]Dysregulation of ECM receptor interaction affects tumor progression by promoting cell proliferation, differentiation, migration, and survival. [60,61]Dynamic regulation of focal adhesion and regulation of actin cytoskeleton are key factors in cell migration, exerting significant effects on promoting the invasion of tumor cells. [62]Also, the NOD-like receptor signaling pathway is associated with the immune system and regulates the development of cancer. [63]Overall, we observed multiple biological functions and pathways correlated with immune response, cell proliferation, and tumor progression in the high-risk subgroup.
Currently, some limitations remain in our study.First, only data from CESC patients in the TCGA database were analyzed, lacking independent external data to enhance the credibility of this model.Secondly, because of the limited information about CESC patients provided by the TCGA database, the nomogram built in the study could not include relatively abundant clinical factors.Finally, rigorous experiments are required to further gain a deeper understanding of the intrinsic mechanism by which the identified RNA editing sites function in CESC.

Conclusions
We developed a risk model utilizing RNA editing and verified that it has good prognostic value for CESC patients.By integrating risk scores with clinical factors, the nomogram with good predictive performance was constructed.Through in-depth analysis, we found that RNA editing plays a crucial role in affecting tumor progression.It exerts its influence by modulating the proliferation, metastasis, and immune response of tumor cell within the tumor microenvironment.

Figure 1 .
Figure 1.The workflow chart of this study.

Figure 2 .
Figure 2. Preliminary screening of RNA editing sites related to prognosis.(A) The Manhattan plot showing the chromosomal locations of 16 RNA editing sites screened by univariate Cox regression analysis.(B) LASSO regression: the curve of Partial likelihood deviance changing with Log (λ), the smaller the value, the better the model fitting.LASSO = least absolute shrinkage and selection operator.

Figure 3 .
Figure 3. Assessing the prognostic value of the risk model in the entire dataset, training group, and testing group.(A-C) Distribution of risk score based on RNA editing prognostic model.(D-F) Survival status based on different risk scores.(G-I) Editing levels of 6 RNA editing sites in different risk groups.(J-L) Kaplan-Meier curves showing OS probabilities for different risk groups.(M) Kaplan-Meier curves showing PFS probabilities for different risk groups in the entire dataset.PFS = progression-free survival.

Figure 4 .
Figure 4.The relationship between risk score and different clinical features.(A) Analysis of differences in risk scores across age groups.(B) Analysis of differences in risk scores among different tumor stage groups.(C) Analysis of differences in risk scores between different tumor grade groups.

Figure 5 .
Figure 5. Establishment and evaluation of the nomogram based on risk score and clinical characteristics.(A) The nomogram to predict 1-, 2-, and 3-yr OS probabilities in CESC patients.(B) Calibration curves were applied to assess the 1-, 2-and 3-yr OS predictive accuracy of the nomograms in the entire dataset.(C) ROC curves showed AUC values for risk score, nomogram, age, tumor stage and tumor grade in the entire dataset.(D) Decision curves displayed the comparison of the net benefits of clinical decisions based on risk score, nomogram, age, tumor stage and tumor grade.AUC = area under the curve, CESC = cervical squamous cell carcinoma and endocervical adenocarcinoma, ROC = receiver operating characteristic.

Figure 6 .
Figure 6.Relationship between RNA editing sites in the prognostic model and the expression of host genes.(A-F) Correlation analyses between editing levels of 6 prognostic sites and expression levels of host genes.(G) Correlation analysis of RNA editing-based risk score with ADAR expression.ADARs = adenosine deaminases acting on RNA.

Table 1
Baseline characteristics of patients in the training group and testing group.

Table 2
Univariate and multivariate Cox regression analyses of the prognosis-related factors.